Efficient pitch-based estimation of VTLN warp factors

نویسندگان

  • Arlo Faria
  • David Gelbart
چکیده

To reduce inter-speaker variability, vocal tract length normalization (VTLN) is commonly used to transform acoustic features for automatic speech recognition (ASR). The warp factors used in this process are usually derived by maximum likelihood (ML) estimation, involving an exhaustive search over possible values. We describe an alternative approach: exploit the correlation between a speaker’s average pitch and vocal tract length, and model the probability distribution of warp factors conditioned on pitch observations. This can be used directly for warp factor estimation, or as a smoothing prior in combination with ML estimates. Pitch-based warp factor estimation for VTLN is effective and requires relatively little memory and computation. Such an approach is well-suited for environments with constrained resources, or where pitch is already being computed for other purposes.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Efficient Pitch-based Estimation o

To reduce inter-speaker variability, vocal tract length normalization (VTLN) is commonly used to transform acoustic features for automatic speech recognition (ASR). The warp factors used in this process are usually derived by maximum likelihood (ML) estimation, involving an exhaustive search over possible values. We describe an alternative approach: exploit the correlation between a speaker’s a...

متن کامل

Using VTLN for broadcast news transcription

Vocal tract length normalisation (VTLN) is a commonly used speaker normalisation approach. It is attractive compared to many normalisation schemes as it is typically dependent on only a single parameter, allowing the warp factors to be robustly calculated on little data. However, the scheme normally requires explicitly coding the data at multiple warp factors. Furthermore, it is only possible t...

متن کامل

A computationally efficient approach to warp factor estimation in VTLN using EM algorithm and sufficient statistics

In this paper, we develop a computationally efficient approach for warp factor estimation in Vocal Tract Length Normalization (VTLN). Recently we have shown that warped features can be obtained by a linear transformation of the unwarped features. Using the warp matrices we show that warp factor estimation can be efficiently performed in an EM framework. This can be done by collecting Sufficient...

متن کامل

Using VTLN matrices for rapid and computationally-efficient speaker adaptation with robustness to first-pass transcription errors

In this paper, we propose to combine the rapid adaptation capability of conventional Vocal Tract Length Normalization (VTLN) with the computational efficiency of transform-based adaptation such as MLLR or CMLLR. VTLN requires the estimation of only one parameter and is, therefore, most suited for the cases where there is little adaptation data (i.e. rapid adaptation). In contrast, transform-bas...

متن کامل

Efficient Speaker and Noise Normalization for Robust Speech Recognition

In this paper, we describe a computationally efficient approach for combining speaker and noise normalization techniques. In particular, we combine the simple yet effective Histogram Equalization (HEQ) for noise compensation with Vocal-tract length normalization (VTLN) for speaker-normalization. While it is intuitive to remove noise first and then perform VTLN, this is difficult since HEQ perfo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005